Linear modes of gene expression determined by independent component analysis
Author
Abstract
MOTIVATION: The expression of genes is controlled by specific combinations of cellular variables. We applied Independent Component Analysis (ICA) to gene expression data to derive a linear model based on hidden variables, which we term 'expression modes'. The expression of each gene is a linear function of the expression modes. According to the ICA model, the linear influences of different modes show minimal statistical dependence, and their distributions deviate sharply from the normal distribution.

RESULTS: Studying cell cycle-related gene expression in yeast, we found that the dominant expression modes could be related to distinct biological functions, such as phases of the cell cycle or the mating response. Analysis of human lymphocytes revealed modes related to characteristic differences between cell types. In both data sets, the linear influences of the dominant modes had heavy-tailed distributions, indicating the existence of specifically up- and downregulated target genes. The expression modes and their influences can be used to visualize samples and genes in low-dimensional spaces. Projecting the data onto expression modes helps to highlight particular biological functions, to reduce noise, and to compress the data in a biologically sensible way.
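The linear model in the abstract can be written as X_c ≈ Y M, where X_c is a centered genes × conditions expression matrix, each column of Y holds one mode's influences on all genes, and each row of M is one expression mode over the conditions. Below is a minimal NumPy sketch of that model, not the paper's implementation. Several assumptions apply: the data are a genes × conditions array, genes are treated as the observations for ICA, symmetric FastICA with a tanh contrast is used as the ICA algorithm (the abstract does not name one), and the function names expression_modes, excess_kurtosis and top_target_genes are hypothetical.

```python
import numpy as np

# Hypothetical helpers. FastICA (symmetric, tanh contrast) is an assumed
# choice of ICA algorithm, used here only for illustration.


def expression_modes(X, n_modes, max_iter=500, tol=1e-6, seed=0):
    """Decompose X (genes x conditions) as X_centered ~= influences @ modes.

    Genes are treated as observations, so the columns of `influences`
    (one per mode) are made as independent and non-Gaussian as possible.
    """
    X = np.asarray(X, dtype=float)
    n_genes = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each condition over all genes

    # PCA whitening: keep the n_modes leading directions at unit variance,
    # so that Z.T @ Z / n_genes is the identity matrix.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :n_modes] * np.sqrt(n_genes)

    # Random orthogonal starting point for the unmixing matrix W.
    rng = np.random.default_rng(seed)
    u, _, vt = np.linalg.svd(rng.standard_normal((n_modes, n_modes)))
    W = u @ vt

    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)  # nonlinearity applied to current estimates
        # Fixed-point step for every row w: E[z g(w.z)] - E[g'(w.z)] w.
        W_new = (G.T @ Z) / n_genes - np.diag((1.0 - G**2).mean(axis=0)) @ W
        # Symmetric decorrelation W <- (W W^T)^(-1/2) W, done via SVD.
        u, _, vt = np.linalg.svd(W_new)
        W_new = u @ vt
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break

    influences = Z @ W.T  # (n_genes, n_modes), unit variance per column
    # Since W is orthogonal, influences @ modes equals the rank-n_modes
    # PCA reconstruction of Xc (the compressed, denoised data).
    modes = W @ np.diag(s[:n_modes]) @ Vt[:n_modes] / np.sqrt(n_genes)
    return influences, modes


def excess_kurtosis(y):
    """Excess kurtosis; values well above 0 indicate heavy tails."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    return np.mean(y**4) / np.mean(y**2) ** 2 - 3.0


def top_target_genes(influences, mode, n_top=10):
    """Indices of genes in the positive and negative tails of one mode.

    ICA does not fix the sign of a mode, so which tail counts as
    'upregulated' is a convention the analyst has to choose.
    """
    order = np.argsort(influences[:, mode])
    return order[-n_top:][::-1], order[:n_top]
```

Because ICA determines neither the sign nor the order of the components, labeling a tail as up- or downregulated requires a convention, for example flipping each mode so that its heavier tail is positive. Keeping fewer modes than conditions gives the low-dimensional, noise-reduced representation described in the abstract.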
Journal: Bioinformatics
Volume 18, Issue 1
Pages: -
Publication date: 2002